
12 research outputs found

CrypTFlow2: Practical 2-Party Secure Inference

Author: Agrawal Nitin
Ball Marshall
Beaver Donald
Blakley G. R.
Boemer Fabian
Brakerski Zvika
Brassard Gilles
Chandran Nishanth
Chi-Chih Yao Andrew
Couteau Geoffroy
Dathathri Roshan
Demmler Daniel
Dessouky Ghada
Escudero Daniel
Garay Juan A.
Gilad-Bachrach Ran
Goldreich Oded
Gueron Shay
Guo C.
Hazay Carmit
He Kaiming
Huang Gao
Hubara Itay
Ishai Yuval
Jacob Benoit
Juvekar Chiraag
Kolesnikov Vladimir
Kumar Nishant
Liu Jian
Mishra Pratyush
Mohassel Payman
Nagel Markus
Niklas Bü
Riazi M. Sadegh
Rouhani Bita Darvish
Wagh Sameer
Zheng Wenting
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 18/08/2020

We present CrypTFlow2, a cryptographic framework for secure inference over realistic Deep Neural Networks (DNNs) using secure 2-party computation. CrypTFlow2 protocols are both correct -- i.e., their outputs are bitwise equivalent to the cleartext execution -- and efficient -- they outperform the state-of-the-art protocols in both latency and scale. At the core of CrypTFlow2, we have new 2PC protocols for secure comparison and division, designed carefully to balance round and communication complexity for secure inference tasks. Using CrypTFlow2, we present the first secure inference over ImageNet-scale DNNs like ResNet50 and DenseNet121. These DNNs are at least an order of magnitude larger than those considered in prior work on 2-party DNN inference. Even on the benchmarks considered by prior work, CrypTFlow2 requires an order of magnitude less communication and 20x-30x less time than the state-of-the-art.

Crossref

Cryptology ePrint Archive


Compiler and runtime systems for homomorphic encryption and graph processing on distributed and heterogeneous architectures

Author: Dathathri Roshan
Publication date: 09/10/2020

Distributed and heterogeneous architectures are tedious to program because devices such as CPUs, GPUs, and FPGAs provide different programming abstractions and may have disjoint memories, even if they are on the same machine. In this thesis, I present compiler and runtime systems that make it easier to develop efficient programs for privacy-preserving computation and graph processing applications on such architectures. Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations on encrypted data without requiring a secret key. Recent cryptographic advances have pushed FHE into the realm of practical applications. However, programming these applications remains a huge challenge, as it requires cryptographic domain expertise to ensure correctness, security, and performance. This thesis introduces a domain-specific compiler for fully-homomorphic deep neural network (DNN) inferencing as well as a general-purpose language and compiler for fully-homomorphic computation: 1. I present CHET, a domain-specific optimizing compiler that is designed to make the task of programming DNN inference applications using FHE easier. CHET automates many laborious and error-prone programming tasks including encryption parameter selection to guarantee security and accuracy of the computation, determining efficient data layouts, and performing scheme-specific optimizations. Our evaluation of CHET on a collection of popular DNNs shows that CHET-generated programs outperform expert-tuned ones by an order of magnitude. 2. I present a new FHE language called Encrypted Vector Arithmetic (EVA), which includes an optimizing compiler that generates correct and secure FHE programs, while hiding all the complexities of the target FHE scheme. Bolstered by our optimizing compiler, programmers can develop efficient general-purpose FHE applications directly in EVA.
EVA is designed to also work as an intermediate representation that can be a target for compiling higher-level domain-specific languages. To demonstrate this, we have re-targeted CHET onto EVA. Due to the novel optimizations in EVA, its programs are on average ~ 5.3x faster than those generated by the unmodified version of CHET. These languages and compilers enable a wider adoption of FHE. Applications in several areas like machine learning, bioinformatics, and security need to process and analyze very large graphs. Distributed clusters are essential in processing such graphs in reasonable time. I present a novel approach to building distributed graph analytics systems that exploits heterogeneity in processor types, partitioning policies, and programming models. The key to this approach is Gluon, a domain-specific communication-optimizing substrate. Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters in the bulk-synchronous parallel (BSP) model and optimizes communication in a novel way by exploiting structural and temporal invariants of graph partitioning policies. We also extend Gluon to support lock-free, non-blocking, bulk-asynchronous execution by introducing the bulk-asynchronous parallel (BASP) model. Our experiments were done on CPU clusters with up to 256 multi-core, multi-socket hosts and on multi-GPU clusters with up to 64 GPUs. The communication optimizations in Gluon improve end-to-end application execution time by ~ 2.6x on average. Gluon's BASP-style execution is on average ~ 1.5x faster than its BSP-style execution for graph applications on real-world large-diameter graphs at scale.
The D-Galois and D-IrGL systems built using Gluon scale well and are faster than Gemini, the state-of-the-art distributed CPU-only graph analytics system, by factors of ~ 3.9x and ~ 4.9x on average using distributed CPUs and distributed GPUs respectively. The Gluon-based D-IrGL system for distributed GPUs is also on average ~ 12x faster than Lux, the only other distributed GPU-only graph analytics system. The Gluon-based D-IrGL system was one of the first distributed GPU graph analytics systems and is the only asynchronous one.

Field of study: Computer Science

Texas ScholarWorks

Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory

Author: Bondhugula Uday
Dathathri Roshan
Ramashekar Thejas
Reddy Chandan
Publication venue: IEEE

Programming for parallel architectures that do not have a shared address space is extremely difficult due to the need for explicit communication between memories of different compute devices. Heterogeneous systems with CPUs and multiple GPUs and distributed-memory clusters are examples of such systems. Past works that try to automate data movement for distributed-memory architectures can lead to excessive redundant communication. In this paper, we propose an automatic data movement scheme that minimizes the volume of communication between compute devices in heterogeneous and distributed-memory systems. We show that by partitioning data dependences in a particular non-trivial way, one can generate data movement code that results in the minimum volume for a vast majority of cases. The techniques are applicable to any sequence of affine loop nests and work on top of any choice of loop transformations, parallelization, and computation placement. The data movement code generated minimizes the volume of communication for a particular configuration of these. We use a combination of powerful static analyses relying on the polyhedral compiler framework and lightweight runtime routines they generate, to build a source-to-source transformation tool that automatically generates communication code. We demonstrate that the tool is scalable and leads to substantial gains in efficiency. On a heterogeneous system, the communication volume is reduced by a factor of 11X to 83X over state-of-the-art, translating into a mean execution time speedup of 1.53X. On a distributed-memory cluster, our scheme reduces the communication volume by a factor of 1.4X to 63.5X over state-of-the-art, resulting in a mean speedup of 1.55X. In addition, our scheme yields a mean speedup of 2.19X over hand-optimized UPC codes.

Open Access Repository of IISc Research Publications

Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining

Author: Chen Xuhao
Dathathri Roshan
Gill Gurbinder
Hoang Loc
Pingali Keshav
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/10/2022

DSpace@MIT

Low-latency graph streaming using compressed purely-functional trees

Author: Acar Umut A.
Beamer Scott
Ben-David Naama
Blandford Daniel K.
Blelloch Guy E.
Bronson Nathan
Busato F.
Daniel
Dathathri Roshan
Dhulipala Laxman
Gonzalez Joseph E
Iyer Anand
Iyer Anand Padmanabha
Khurana Udayan
Kumar P.
Kumar Pradeep
Low Yucheng
Prabhakaran Vijayan
Sengupta Dipanjan
Shun Julian
Wang Kai
Winter Martin
Yin Chunxing
Publication venue: Association for Computing Machinery (ACM)
Publication date: 17/04/2019

There has been a growing interest in the graph-streaming setting where a continuous stream of graph updates is mixed with graph queries. In principle, purely-functional trees are an ideal fit for this setting as they enable safe parallelism, lightweight snapshots, and strict serializability for queries. However, directly using them for graph processing leads to significant space overhead and poor cache locality. This paper presents C-trees, a compressed purely-functional search tree data structure that significantly improves on the space usage and locality of purely-functional trees. We design theoretically-efficient and practical algorithms for performing batch updates to C-trees, and also show that we can store massive dynamic real-world graphs using only a few bytes per edge, thereby achieving space usage close to that of the best static graph processing frameworks. To study the applicability of our data structure, we designed Aspen, a graph-streaming framework that extends the interface of Ligra with operations for updating graphs. We show that Aspen is faster than two state-of-the-art graph-streaming systems, Stinger and LLAMA, while requiring less memory, and is competitive in performance with the state-of-the-art static graph frameworks, Galois, GAP, and Ligra+. With Aspen, we are able to efficiently process the largest publicly-available graph with over two hundred billion edges in the graph-streaming setting using a single commodity multicore server with 1TB of memory.
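
The claim that purely-functional trees give lightweight snapshots can be illustrated with a minimal sketch. It assumes a plain, unbalanced, uncompressed binary search tree with path copying; it is not the C-tree structure or the Aspen API, and the names `Node`, `insert`, and `keys` are hypothetical.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable tree node (hypothetical name, for illustration only)."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Return a new tree containing key; the input tree is never modified."""
    if root is None:
        return Node(key)
    if key < root.key:
        # Copy only the nodes on the search path; reuse the right subtree.
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present, reuse the whole tree


def keys(root: Optional[Node]) -> list:
    """In-order traversal, returning the keys in sorted order."""
    if root is None:
        return []
    return keys(root.left) + [root.key] + keys(root.right)


v1 = None
for k in (5, 2, 8):
    v1 = insert(v1, k)

snapshot = v1          # taking a snapshot is just keeping the old root
v2 = insert(v1, 3)     # an update produces a new version

print(keys(snapshot))  # the snapshot still sees the old contents
print(keys(v2))        # the new version sees the update
```

Because an update copies only the nodes along one search path, the old and new versions share every untouched subtree, so a reader holding `snapshot` can proceed safely while a writer builds `v2`.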

arXiv.org e-Print Archive

DSpace@MIT

Crossref


Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite

Author: Azad Ariful
Aznaveh Mohsen Mahmoudi
Beamer Scott
Blanco Mark
Chen Jinhao
D'Alessandro Luke
Dathathri Roshan
Davis Tim
Deweese Kevin
Firoz Jesun
Gabb Henry A
Gill Gurbinder
Hegyi Balint
Kolodziej Scott
Low Tze Meng
Lumsdaine Andrew
Manlaibaatar Tugsbayasgalan
Mattson Timothy G
McMillan Scott
Peri Ramesh
Pingali Keshav
Sridhar Upasana
Szarnyas Gabor
Zhang Yongzhe
Zhang Yunming
Publication venue: eScholarship, University of California
Publication date: 30/10/2020

Graphs play a key role in data analytics. Graphs and the software systems used to work with them are highly diverse. Algorithms interact with hardware in different ways, and which graph solution works best on a given platform changes with the structure of the graph. This makes it difficult to decide which graph programming framework is the best for a given situation. In this paper, we try to make sense of this diverse landscape. We evaluate five different frameworks for graph analytics: SuiteSparse GraphBLAS, Galois, the NWGraph library, the Graph Kernel Collection, and GraphIt. We use the GAP Benchmark Suite to evaluate each framework. GAP consists of 30 tests: six graph algorithms (breadth-first search, single-source shortest path, PageRank, betweenness centrality, connected components, and triangle counting) on five graphs. The GAP Benchmark Suite includes high-performance reference implementations to provide a performance baseline for comparison. Our results show the relative strengths of each framework, but also serve as a case study for the challenges of establishing objective measures for comparing graph frameworks.
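
The count of 30 tests follows from pairing each of the six algorithms with each of the five graphs. The sketch below enumerates that test matrix; the algorithm names come from the abstract, while the graph names are placeholders because the abstract does not list them.

```python
from itertools import product

# The six algorithms named in the abstract.
KERNELS = [
    "breadth-first search",
    "single-source shortest path",
    "PageRank",
    "betweenness centrality",
    "connected components",
    "triangle counting",
]

# Placeholder graph names (assumption): the abstract only says "five graphs".
GRAPHS = [f"graph_{i}" for i in range(1, 6)]

# One test per (algorithm, graph) pair.
tests = list(product(KERNELS, GRAPHS))

print(len(tests))  # 6 algorithms x 5 graphs
```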

eScholarship - University of California